Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column data type handling #5

Merged
merged 17 commits into from
May 3, 2018
Merged

Column data type handling #5

merged 17 commits into from
May 3, 2018

Conversation

bpintea
Copy link
Collaborator

@bpintea bpintea commented Apr 30, 2018

Once a query is executed, the client-app queries the driver for a set of details related to the results it got: how many columns it got, number of rows, types of the columns.
Column type is mapped onto a descriptor, IRD (Implementation Row Descriptor). The details available here are quite numerous: SQL type of data contained in the column, size ("precision"), decimal digits ("scale"), nullable, type name etc., as well as table [base] name, column alias a.s.o.

Generally with DBMSes, the data types can be either fixed SQL data types (ex. INTEGER) or derived, column-specific (like VARCHAR(10), FLOAT(precision, scale)). So, typically, a driver would need to receive from the data source a verbose description of the result set for every query.

However, ES/SQL only supports "fixed" data types currently (i.e. one can not "create" a table/index with specific, non-general types). So the driver takes advantage of this, by querying Elasticsearch for the data types when it first connects to and then cache these (per connection). Then, for each application-specific query result set, the driver will simply refer to the cached types when the app requests details about the columns data types. This is what this PR is mostly about.

Collaterally, the four column data-type concepts - size, dec-digits, octet-len and display size - are finished/fixed.

bpintea added 12 commits April 26, 2018 13:58
- make the warnings visible
- warn if no MSVC edition is found
The 'SYS TYPES' query is now executed right after connectivity check
part of SQLDriverConnect().
The result is cached on the connection (es_type array member, one
element for each type).
When a query is executed and the results processed, the column data type
sent by ES will be used to associate the right es_type element to the
record associated with the column.

This allows stripping the record structure from members that are only
used for data type characterization, thus saving memory, as well as
"transparency", in case new types are added or existing changed.
(Some change would still be needed in the driver, due to way ODBC
specifies "size" and "precision" of a column.)
When returning strings to the application, ODBC requires to include, but
don't count the 0-terminator. This fixes the counting part (it was
counted before)..
make use of cached data types when returning info to application, strip
them record structure.
s/x-pack-odbc/elasticsearch-sql-odbc/
Use cached ES return type for these record members, in case they belong
to an IRD descriptor.
Use the values provided by server in cached ES/SQL types to populate the
IRD type members (away from SQL C types). This cleans up the
implementation a bit and will allows copying of the descriptors later,
if needed.
- placed in a header, to easy find and change if needed
- this is a r/o IRD specific field => moved it to es_type;
- implement the Display Size calculation.
- fixed/finished the logic for these two measures
@bpintea bpintea requested review from droberts195 and edsavage April 30, 2018 19:11
@bpintea bpintea mentioned this pull request Apr 30, 2018
Closed
Copy link
Collaborator

@edsavage edsavage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me Bogdan although I would prefer that load_es_types was refactored to aid readability

driver/connect.c Outdated
goto end;
}

/* check row statues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: statuses?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. (these rows are indeed not that monumental.)

driver/connect.c Outdated
}

/* check row statues;
* calculate the lenght of all strings (SQLWCHAR members) returned
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length.. (your favourite typo!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. (just ran a grep, this typo is occurring 40+ times..)

driver/connect.c Outdated
@@ -1499,7 +2105,7 @@ SQLRETURN EsSQLGetConnectAttrW(
case SQL_ATTR_CURRENT_CATALOG:
DBGH(dbc, "requested: catalog name (@0x%p).", dbc->catalog);
#if 0
if (! dbc->conn) {
if (! dbc->es_types) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that we didn't have unreachable code blocks such as this one. Could it safely be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this block to remove/update the dead code.

* "manually" copy from a row structure (defined into the function) into the
* estype structure. The array of estype structs is returned.
*/
static BOOL load_es_types(esodbc_dbc_st *dbc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is very long! Is there any it could be broken into smaller, more easily digestible chunks?

An initial suggestion would be to move the macro and structure definitions out - although I do realise that this would break the scoping model you've used here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've broken the function down.
I didn't want initially to have another structure/type declared, even though I didn't like the length of the resulted function either. But I think you're ultimately right, having it smaller/readable is a better solution.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment is just a question. As long as it's been thought through that's fine.

driver/handles.c Outdated
* comment at function end), so these values must stay in sync, since there
* are no C corresponding defines for verbose and sub-code (i.e. nothing like
* "SQL_C_DATETIME" or "SQL_C_CODE_DATE").
* The identity does not hold across the bord, though (extended values, like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: bord -> board

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

#define ESODBC_SQL_NULL 0
#define ESODBC_SQL_UNSUPPORTED 1111
#define ESODBC_SQL_OBJECT 2002
#define ESODBC_SQL_NESTED 2002
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also geo and specialised types.

Do they get converted to "unsupported" on the Java side?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the specialised types are sent as UNSUPPORTED currently, yes (at least the IP type is). Not sure where geo would fit, but I guess it's treated similarly.

bpintea added 5 commits May 2, 2018 12:57
- mostly s/lenght/length
The sign-explicit defs have been added to replace the implicitly signed
ones.
Update the logic that gets the catalog name: there's a new SYS CATALOGS
available system query that can be used, so the way to implement it in
the driver is now clear.
BIGINT is a new type, which are all explicitely signed/unsigned.
(not sure how it slipped in previous commit, past a pre-commit compile)
load_es_types() got too large.
@bpintea bpintea merged commit d4127f1 into elastic:master May 3, 2018
@bpintea bpintea deleted the feature/column_data_type branch May 3, 2018 11:55
bpintea added a commit that referenced this pull request Jun 4, 2018
@bpintea bpintea added >feature Applicable to PRs adding new functionality v6.5.0 labels May 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature Applicable to PRs adding new functionality v6.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants